Welcome to your introduction to network analysis. In this session you will learn:

  1. Why network analysis helps to answer certain questions, and why framing certain contexts as networks yields new insights.
  2. The basic structure of relational data.
  3. How to construct graph objects from different data sources.
  4. How to analyse basic features of nodes, edges, and graphs.
  5. How to identify groups and communities in graphs.
  6. How to do simple network visualizations.

Introduction

So what?

So, before we talk about networks, one thing upfront… why should we? I mean, they undeniably look pretty, don't they? Somehow, the visualization of networks fascinates the human mind (find a short TED talk on networks and how they depict our world here), and has even inspired its own art movement, networkism (see some examples here).

Yet, beyond that, is there analytical value for a data scientist in bothering with networks?

The basic jargon

First of all, what is a network? Plainly speaking, a network is a system of elements which are connected by some relationship. The vocabulary can be a bit technical and even inconsistent between different disciplines, packages, and software.

The whole system is (surprise, surprise) usually called a network or graph. The elements are commonly referred to as nodes (system theory jargon) or vertices (graph theory jargon) of a graph, while the connections are edges or links. I will mostly refer to the elements as nodes, and their connections as edges.

Generally, networks are a form of representing relational data. This is a very general tool that can be applied to many different types of relationships between all kinds of elements. The content, meaning, and interpretation naturally depend on which elements we display and which types of relationships. For example:

  • In Social Network Analysis:
    • Nodes represent actors (which can be persons, firms and other socially constructed entities)
    • Edges represent relationships between these actors (friendship, interaction, co-affiliation, similarity, etc.)
  • Other types of networks
    • Chemistry: Interaction between molecules
    • Computer Science: The World Wide Web, inter- and intranet topologies
    • Biology: Food-web, ant-hives

The possibilities to depict relational data are manifold. For example:

  • Relations among persons
    • Kinship: mother of, wife of…
    • Other role based: boss of, supervisor of…
    • Affective: likes, trusts…
    • Interaction: give advice, talks to, retweets…
    • Affiliation: belong to the same clubs, share the same interests…
  • Relations among organizations
    • As corporate entities, joint ventures, strategic alliances
    • Buy from / sell to, leases to, outsources to
    • Owns shares of, subsidiary of
    • Via their members (Personnel flows, friendship…)

Why is it useful?

Relational data structures

Edgelist

Most real-world relational data is found in what we call an edge list: a dataframe that contains at least two columns, one with the nodes that are the source of a connection and another with the nodes that are the target of the connection. The nodes in the data are identified by unique IDs.

If the distinction between source and target is meaningful, the network is directed. If the distinction is not meaningful, the network is undirected (more on that later). So, every row that contains the ID of one element in column 1 and the ID of another element in column 2 indicates that a connection between them exists.

An edge list can also contain additional columns that describe attributes of the edges, such as the magnitude of a connection. If the edges have a magnitude attribute, the graph is considered weighted (e.g., number of interactions, strength of friendship).

Below is an example of a minimal edge list created with the tibble() function. In this case, let us assume this network to be unweighted, meaning a connection can be either present or absent.

library(tidyverse) # tibble(), dplyr, tidyr, readr, ggplot2 and the %>% pipe
edge_list <- tibble(from = c(1, 2, 2, 1, 4), 
                    to = c(2, 3, 4, 5, 1))

edge_list
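To make the directed/undirected and weighted distinctions concrete, here is a minimal base-R sketch on a hypothetical weighted edge list (`edges_w` and its values are made up for illustration). If the network is undirected, the rows 1 → 2 and 2 → 1 describe the same edge, so one common approach is to sort each pair and sum the weights of duplicates:

```r
# Hypothetical weighted edge list: the pair 1-2 appears in both directions
edges_w <- data.frame(from   = c(1, 2, 2, 3),
                      to     = c(2, 1, 3, 1),
                      weight = c(2, 1, 4, 1))

# Treat the network as undirected: put the smaller ID first,
# so that (2, 1) and (1, 2) become the same pair
undirected <- data.frame(a      = pmin(edges_w$from, edges_w$to),
                         b      = pmax(edges_w$from, edges_w$to),
                         weight = edges_w$weight)

# Collapse duplicate pairs into one edge whose weight is the sum
collapsed <- aggregate(weight ~ a + b, data = undirected, FUN = sum)
collapsed
```

In a directed network, the rows 1 → 2 and 2 → 1 would instead stay two separate edges.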

Adjacency Matrix

A second popular form of network representation is the adjacency matrix (also called socio-matrix). It is an \(n \times n\) matrix, where \(n\) stands for the number of nodes whose relationships should be represented. The value in the cell at the intersection of row \(i\) and column \(j\) indicates whether an edge from node \(i\) to node \(j\) is present (=1) or absent (=0).

Tip: Given an edgelist, an adjacency matrix can easily be produced by crosstabulating:

adj_matrix <- edge_list %>%
  table() %>% 
  as.matrix()

adj_matrix
##     to
## from 1 2 3 4 5
##    1 0 1 0 0 1
##    2 0 0 1 1 0
##    4 1 0 0 0 0

Note: Existing as well as non-existing connections are stored. Since most real-world networks are sparse (= far more potential connections are absent than present), this is inefficient for storage and computation. Here, a dgCMatrix object from the Matrix package can be helpful.

library(Matrix)
sparse_matrix <- edge_list %>%
  table() %>% 
  Matrix(sparse = TRUE)

sparse_matrix
## 3 x 5 sparse Matrix of class "dgCMatrix"
##     to
## from 1 2 3 4 5
##    1 . 1 . . 1
##    2 . . 1 1 .
##    4 1 . . . .

This sparse data structure only stores references to the non-empty cells and their values.

sparse_matrix %>% str()
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   ..@ i       : int [1:5] 2 0 1 1 0
##   ..@ p       : int [1:6] 0 1 2 3 4 5
##   ..@ Dim     : int [1:2] 3 5
##   ..@ Dimnames:List of 2
##   .. ..$ from: chr [1:3] "1" "2" "4"
##   .. ..$ to  : chr [1:5] "1" "2" "3" "4" ...
##   ..@ x       : num [1:5] 1 1 1 1 1
##   ..@ factors : list()
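The three slots that matter here follow the compressed sparse column (CSC) layout: `@x` holds the non-zero values, `@i` their 0-based row indices, and `@p` marks where each column starts within `i` and `x`. A minimal base-R sketch that rebuilds these slots from the dense 3 × 5 cross-table above (the helper `to_csc()` is hypothetical, not part of the Matrix package):

```r
# Dense version of the 3 x 5 cross-table shown above
dense <- matrix(c(0, 1, 0, 0, 1,
                  0, 0, 1, 1, 0,
                  1, 0, 0, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(from = c("1", "2", "4"), to = as.character(1:5)))

# Hypothetical helper: rebuild the i / p / x slots of a dgCMatrix
to_csc <- function(m) {
  nz <- which(m != 0, arr.ind = TRUE)                        # row/col of non-zero cells
  nz <- nz[order(nz[, "col"], nz[, "row"]), , drop = FALSE]  # column-major order
  list(
    i = as.integer(nz[, "row"] - 1L),                        # 0-based row indices
    p = c(0L, cumsum(tabulate(nz[, "col"], nbins = ncol(m)))),  # column pointers
    x = unname(m[nz])                                        # the non-zero values
  )
}

csc <- to_csc(dense)
csc$i  # 2 0 1 1 0, as in the str() output above
csc$p  # 0 1 2 3 4 5
```

Because only non-zero cells are kept, memory grows with the number of edges rather than with \(n^2\).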

Nodelists

Edge lists as well as adjacency matrices only store the connectivity pattern between nodes, but due to their structure they cannot store information on the nodes themselves, in which we might be interested. Therefore, in many cases we also provide a node list with this information (such as the names of the nodes or any kind of grouping).

node_list <- tibble(id = 1:5, 
                    name = c("Jesper", "Pernille", "Jacob", "Dorte", "Donald"),
                    gender = c("M", "F", "M", "F", "M"),
                    group = c("A", "B", "B", "A", "C"))
node_list

Graph Objects

So far, we have seen that relational data, and its analysis, has some particularities that set it apart from the tabular data (e.g., dataframes) we usually work with.

  • Tabular data
    • In tabular data, the summary statistics of a variable depend on all observations (column-wise): changing the value of one observation changes the summary statistics of the corresponding variable.
    • Likewise, values may be interdependent within an observation (row-wise): changing one variable value might change summary statistics of that observation.
    • Otherwise, values are (at least mathematically) independent.
  • Graph data
    • The same holds true, but there are additional interdependencies due to the relational structure of the data.
    • Node and edge data are stored separately but are interdependent: removing a node also implies removing its edges, and removing edges changes the characteristics of nodes.
    • In addition, due to the relational structure, this is true not only for directly adjacent nodes and edges but potentially for many more. Adding or removing a single node or edge could change the characteristics of every other node and edge.
    • That is less of a problem for local network characteristics (e.g., a node's degree on level 1). However, many node and edge characteristics, such as betweenness or eigenvector centrality (introduced below), depend on the structure of the whole graph.
    • That is mainly why graph computing is somewhat messier and needs its own mathematical tools and applications from graph computing ("graph" as in network, not as in figure).
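A small base-R illustration of this interdependence, using the five-node example network from above (treated as undirected): deleting a single node also deletes its edges, which changes the degree of its neighbours and can even isolate another node.

```r
# Undirected adjacency matrix of the example network (edges 1-2, 2-3, 2-4, 1-5, 1-4)
from <- c(1, 2, 2, 1, 4)
to   <- c(2, 3, 4, 5, 1)
A <- matrix(0, nrow = 5, ncol = 5, dimnames = list(1:5, 1:5))
A[cbind(from, to)] <- 1
A[cbind(to, from)] <- 1

degree_before <- rowSums(A)

# Remove node 1: dropping its row and column also drops all of its edges
A_without_1  <- A[-1, -1]
degree_after <- rowSums(A_without_1)

degree_before  # nodes 1..5: 3 3 1 2 1
degree_after   # nodes 2..5: 2 1 1 0 (node 5 is now isolated)
```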

Therefore, network analysis packages in R, Python, and elsewhere usually define their own graph objects (containing information on nodes as well as edges), in which network data is stored for further analysis.

Graph objects in igraph

One of the most popular network/graph analytics frameworks in R and Python alike is igraph. It provides a powerful toolbox for analysis as well as plotting. Let's take a peek.

To create an igraph object from an edge-list data frame, we can use the graph_from_data_frame() function, which is a bit more straightforward than network() from the network package. The graph_from_data_frame() function takes three arguments: d, vertices, and directed. Here, d refers to the edge list, vertices to the node list (whose first column must contain the node IDs used in the edge list), and directed can be either TRUE or FALSE depending on whether the data is directed or undirected. By default, graph_from_data_frame() treats the first two columns of the edge list as the source and target node IDs, and any remaining columns as edge attributes.

library(igraph)
g <- graph_from_data_frame(d = edge_list, vertices = node_list, directed = FALSE)
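# g <- graph_from_adjacency_matrix(adj_matrix, mode = "undirected") # Alternative for a square (n x n) adjacency matrix; adj_matrix above is 3 x 5, so it would first need rows and columns for all 5 nodes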
g
## IGRAPH a3d615a UN-- 5 5 -- 
## + attr: name (v/c), gender (v/c), group (v/c)
## + edges from a3d615a (vertex names):
## [1] Jesper  --Pernille Pernille--Jacob    Pernille--Dorte   
## [4] Jesper  --Donald   Jesper  --Dorte

Let's inspect the resulting object. The summary of an igraph object reveals some useful information:

  • First, it tells us the graph type: the first letter is U for undirected or D for directed, and the second letter N indicates that the nodes are named (a W in third place would mark a weighted graph, a B in fourth place a bipartite one).
  • Afterwards, the number of nodes (5) and edges (5).
  • Followed by the node attributes (node-level variables), which in this case are their name, gender, and group (attr: name (v/c), gender (v/c), group (v/c)); v/c stands for a vertex attribute of type character.
  • Lastly, a list of all existing edges. Note: n--m indicates an undirected edge, n->m a directed one.

Let's take a look at the structure of the object:

g[[1:2]] %>% glimpse() # Note the double brackets: [[ ]] on an igraph object returns the adjacency list of the selected nodes
## List of 2
##  $ Jesper  : 'igraph.vs' Named int [1:3] 2 4 5
##   ..- attr(*, "names")= chr [1:3] "Pernille" "Dorte" "Donald"
##   ..- attr(*, "env")=<weakref> 
##   ..- attr(*, "graph")= chr "a3d615af-e5b9-11e9-a70b-3f25fafed331"
##  $ Pernille: 'igraph.vs' Named int [1:3] 1 3 4
##   ..- attr(*, "names")= chr [1:3] "Jesper" "Jacob" "Dorte"
##   ..- attr(*, "env")=<weakref> 
##   ..- attr(*, "graph")= chr "a3d615af-e5b9-11e9-a70b-3f25fafed331"

We see that indexing with double brackets returns a list with one element per selected node. Each element holds the node's neighbours, i.e., its ego network (e.g., ..$ Jesper: 'igraph.vs' Named int [1:3] 2 4 5 tells us that Jesper is connected to nodes 2, 4, and 5: Pernille, Dorte, and Donald), plus some attributes which are irrelevant for now.
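Such a per-node list of neighbours is called an adjacency list. A minimal base-R sketch (not igraph's internal implementation) that builds the same kind of list from the undirected edge list and the names in `node_list`:

```r
from <- c(1, 2, 2, 1, 4)
to   <- c(2, 3, 4, 5, 1)
node_names <- c("Jesper", "Pernille", "Jacob", "Dorte", "Donald")

# In an undirected graph, each edge appears in the neighbour lists of both endpoints
neighbours <- lapply(seq_along(node_names), function(i) {
  sort(c(to[from == i], from[to == i]))
})
names(neighbours) <- node_names

neighbours$Jesper              # 2 4 5
node_names[neighbours$Jesper]  # "Pernille" "Dorte" "Donald"
```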

We can also plot it to take a look. igraph objects can be used directly with the plot() function, and the result can be adjusted with a set of parameters we will discover later. It's not super pretty, so we will later also explore more powerful plotting tools for graphs. However, it's quick and dirty, so let's take it like that for now.

plot(g)

We can inspect and manipulate the nodes via V(g) (V for vertices, which is graph-theory slang) and the edges via E(g).

V(g)
## + 5/5 vertices, named, from a3d615a:
## [1] Jesper   Pernille Jacob    Dorte    Donald
E(g)
## + 5/5 edges from a3d615a (vertex names):
## [1] Jesper  --Pernille Pernille--Jacob    Pernille--Dorte   
## [4] Jesper  --Donald   Jesper  --Dorte

We can also use most of the base-R slicing and dicing.

V(g)[1:3]
## + 3/5 vertices, named, from a3d615a:
## [1] Jesper   Pernille Jacob
E(g)[2:4]
## + 3/5 edges from a3d615a (vertex names):
## [1] Pernille--Jacob  Pernille--Dorte  Jesper  --Donald

Remember, it's a list object. So, if we want the values (here: the node attributes) rather than just the node references, we have to use double brackets [[x]].

V(g)[[1:3]]
## + 3/5 vertices, named, from a3d615a:
##       name gender group
## 1   Jesper      M     A
## 2 Pernille      F     B
## 3    Jacob      M     B

We can also use the $ notation.

V(g)$name
## [1] "Jesper"   "Pernille" "Jacob"    "Dorte"    "Donald"

There is obviously a lot more to say about igraph and its rich functionality. You will learn much of the base functionality of igraph in your DC assignments. Furthermore, Katya Ognyanova has a brilliant tutorial that is well worth studying.

Graph objects in tidygraph

While the igraph functionality still represents the core of R’s network analysis toolbox, recent developments have made network analytics much more accessible and intuitive.

Thomas Lin Pedersen (also known as the developer of beloved packages like ggforce, gganimate, and the R implementation of lime) has recently released the tidygraph package, which leverages the power of igraph in a manner consistent with the tidyverse workflow. It is a lightweight wrapper around the core igraph object and functionality that makes it accessible to most traditional dplyr workflows. Even better, he tops it off with ggraph, a network visualization package with a consistent ggplot2 look and feel.

For that reason, we will mostly work with the tidygraph framework, while in a few cases we still need to draw on base igraph functionality. Let's take a peek.

Creating a tbl_graph

Here, we create the tbl_graph directly from the igraph object.

library(tidygraph)
library(magrittr) # provides the %<>% assignment pipe
g %<>% as_tbl_graph()
g
## # A tbl_graph: 5 nodes and 5 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 5 x 3 (active)
##   name     gender group
##   <chr>    <chr>  <chr>
## 1 Jesper   M      A    
## 2 Pernille F      B    
## 3 Jacob    M      B    
## 4 Dorte    F      A    
## 5 Donald   M      C    
## #
## # Edge Data: 5 x 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     2     3
## 3     2     4
## # ... with 2 more rows

We see a more intuitive representation of the data structure, consisting of a node dataframe and an edge dataframe.

We can, of course, also create it from our initial node and edge lists.

g <- tbl_graph(edges = edge_list, nodes = node_list, directed = FALSE)

Note: The tbl_graph class is a thin wrapper around an igraph object that provides methods for manipulating the graph using the tidy API. As it is just a subclass of igraph every igraph method and its syntax will work as expected and can be used if necessary.

V(g)
## + 5/5 vertices, named, from a44299a:
## [1] Jesper   Pernille Jacob    Dorte    Donald

In addition, the as_tbl_graph() function can also transform network data from other object types such as data.frame, matrix, dendrogram, igraph, etc.

Accessing and manipulating nodes and edges

But how can a graph object be manipulated with dplyr syntax? We know that a graph object contains both a node and an edge dataframe, so commands like g %>% filter(name == "Pernille") would be ambiguous, since it is unclear whether we refer to nodes or edges. tidygraph's solution here is selective activation pipes:

  • %N>% activates nodes
  • %E>% activates edges

Consequently, functions are executed on the currently active dataframe of either nodes or edges. With this simple syntax trick, graphs become subject to most commonly known data manipulation workflows for tabular data.

g %N>%
  filter(gender == "F")
## # A tbl_graph: 2 nodes and 1 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 2 x 4 (active)
##      id name     gender group
##   <int> <chr>    <chr>  <chr>
## 1     2 Pernille F      B    
## 2     4 Dorte    F      A    
## #
## # Edge Data: 1 x 2
##    from    to
##   <int> <int>
## 1     1     2

Note that filtering nodes simultaneously filters the edges. We can, of course, also manipulate both nodes and edges in one pipeline.

g %N>%
  filter(group %in% c("A", "B")) %E>%
  filter(to == 2)
## # A tbl_graph: 4 nodes and 1 edges
## #
## # An undirected simple graph with 3 components
## #
## # Edge Data: 1 x 2 (active)
##    from    to
##   <int> <int>
## 1     1     2
## #
## # Node Data: 4 x 4
##      id name     gender group
##   <int> <chr>    <chr>  <chr>
## 1     1 Jesper   M      A    
## 2     2 Pernille F      B    
## 3     3 Jacob    M      B    
## # ... with 1 more row

Note that filtering the edges did not reduce the node set. While nodes can be isolated in a network, edges without an adjacent node cannot exist.
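Filtering nodes amounts to taking an induced subgraph: keep only the edges whose two endpoints both survive, then renumber the remaining node IDs. A minimal base-R sketch of that logic (an illustration, not tidygraph's actual code), reproducing the gender == "F" example above, where the edge between Pernille (2) and Dorte (4) becomes the edge 1-2:

```r
nodes_df <- data.frame(id     = 1:5,
                       name   = c("Jesper", "Pernille", "Jacob", "Dorte", "Donald"),
                       gender = c("M", "F", "M", "F", "M"),
                       stringsAsFactors = FALSE)
edges_df <- data.frame(from = c(1, 2, 2, 1, 4),
                       to   = c(2, 3, 4, 5, 1))

# 1. Filter the nodes
kept      <- nodes_df$id[nodes_df$gender == "F"]
sub_nodes <- nodes_df[nodes_df$id %in% kept, ]

# 2. Keep only edges whose two endpoints are both still present
sub_edges <- edges_df[edges_df$from %in% kept & edges_df$to %in% kept, ]

# 3. Re-index the endpoints to row positions in the filtered node table
sub_edges$from <- match(sub_edges$from, kept)
sub_edges$to   <- match(sub_edges$to, kept)
sub_edges  # one edge: from 1 to 2
```

Filtering edges, in contrast, needs no such step, because removing an edge never leaves a node without a valid ID.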

We can also pull the virtual node or edge dataframe out of the tbl_graph and use it for tabular analysis.

g %N>%
  filter(group == "B") %>%
  as_tibble()

One last thing for now: while igraph also provides powerful network visualization functionality, I will mostly go with Thomas's sister package, ggraph, which provides a network visualization interface compatible and consistent with ggplot2.

The rest works like any ggplot2 function call, except that we use special geoms for our network, like geom_edge_density() to draw a shadow where the edge density is higher, geom_edge_link() to connect nodes with straight lines, geom_node_point() to draw node points, and geom_node_text() to draw the labels. More options can be found here.

library(ggraph)
g %>% ggraph(layout = 'nicely') + 
  geom_edge_link() + 
  geom_node_point() + 
  geom_node_text(aes(label = name))

Not very impressive up to now, but wait for the real stuff to come…

Network analysis and measures

While being able to use the dplyr verbs on relational data is nice and all, one of the reasons we are dealing with graph data in the first place is that we need graph-based algorithms to solve the problem at hand. If we had to break out of the tidy workflow every time we needed one, we wouldn't have gained much. Because of this, tidygraph wraps more or less all of igraph's algorithms, ensuring a consistent syntax as well as output that fits into the tidy workflow. In the following, we are going to take a look at these.

Central to all of these functions is that they know about which graph is being computed on (in the same way that n() knows about which tibble is currently in scope). Furthermore they always return results matching the node or edge position so they can be used directly in mutate() calls.

Node-Level measures

Often, we are interested in ways to summarize the pattern of node connectivity to infer something on their characteristics.

Let's create an example graph on which we will illustrate some of the most popular ones.

# Generate a sample network with play_barabasi_albert(), based on the Barabási-Albert preferential attachment model
set.seed(1234)
g <- play_barabasi_albert(n = 200, # Number of nodes
                          power = 0.75, # Power of preferential attachment effect
                          directed = FALSE # Undirected network
                          )

# # You can also give it a try with another network structure
# g <- play_smallworld(n_dim = 1, # Number of dimensions (more on that later)
#                      dim_size = 100, # Number of nodes 
#                      order = 3, # The neighborhood size to create connections from
#                      p_rewire = 0.05 # The rewiring probability of edges
#                      ) 
g %>%
    ggraph(layout = "fr") + 
    geom_edge_link() + 
    geom_node_point() + 
    theme_graph() # Adding `theme_graph()` applies a theme better suited for graphs

Centralities

One of the simplest concepts when computing node-level measures is that of centrality, i.e., how central a node or edge is in the graph. As this definition is inherently vague, many different centrality scores exist, each treating the concept of "central" a bit differently.

In the following, we will briefly illustrate the idea behind three of the most popular centrality measures, namely:

  • Degree centrality
  • Eigenvector centrality
  • Betweenness centrality

g <- g %N>%
  mutate(centrality_dgr = centrality_degree(),
         centrality_eigen = centrality_eigen(),
         centrality_between = centrality_betweenness()) 
g %N>%
  as_tibble() %>% 
  head()

tidygraph currently has 11 different centrality measures (base igraph has even more), and all of them are prefixed with centrality_* for easy discoverability. All of them return a numeric vector matching the nodes (or edges in the case of centrality_edge_betweenness()).

Degree centrality

The degree centrality is probably the most intuitive node measure: it simply counts the number of edges adjacent to a node. Formally, in a network with \(n\) nodes, the degree of node \(i\) is the number of existing edges \(e_{ij}\) to other nodes \(j\), where \(e_{ij} = 1\) if an edge between \(i\) and \(j\) exists and \(0\) otherwise:

\[d_i = \sum\limits_{j=1,\, j \neq i}^{n} e_{ij}\]

g %>%
    ggraph(layout = "fr") + 
    geom_edge_link() + 
    geom_node_point(aes(size = centrality_dgr, colour = centrality_dgr)) + 
    scale_color_continuous(guide = "legend") + 
    theme_graph()
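For an unweighted, undirected graph without self-loops or multiple edges, the degree formula simply counts how often each node appears as an endpoint in the edge list. A base-R sketch on the five-node example network, which also checks the handshake lemma: since every edge contributes to the degree of both of its endpoints, the degrees sum to twice the number of edges.

```r
from <- c(1, 2, 2, 1, 4)
to   <- c(2, 3, 4, 5, 1)
n <- 5

# d_i = number of edges in which node i appears (in either column)
deg <- tabulate(c(from, to), nbins = n)
deg                            # 3 3 1 2 1

# Handshake lemma: sum of degrees = 2 * number of edges
sum(deg) == 2 * length(from)   # TRUE
```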

Eigenvector centrality

Eigenvector centrality takes the idea of characterizing nodes by their importance in a network a step further than degree centrality. It is also closely related to the main idea behind the PageRank algorithm that powered Google Search in its early days.

The basic idea is to weight a node's degree centrality by the centrality of the nodes adjacent to it (whose centrality is in turn weighted by the centrality of their neighbours, and so on). This makes nodes that are connected to well-connected nodes more important. The eigenvector is just a clever mathematical trick to solve such a recursive problem.

g %>%
    ggraph(layout = "fr") + 
    geom_edge_link() + 
    geom_node_point(aes(size = centrality_eigen, colour = centrality_eigen)) + 
    scale_color_continuous(guide = "legend") + 
    theme_graph()
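Formally, the eigenvector centrality \(x_i\) satisfies \(x_i = \frac{1}{\lambda}\sum_{j} a_{ij} x_j\), i.e., \(A x = \lambda x\), where \(a_{ij}\) are the entries of the adjacency matrix \(A\) and \(\lambda\) is its largest eigenvalue. A minimal base-R sketch that solves this recursion by power iteration (repeatedly multiplying a score vector by the adjacency matrix and rescaling), on the five-node example network. The helper `eigen_centrality_power()` is hypothetical; rescaling so that the top node scores 1 is assumed to resemble the default scaling of igraph's eigenvector centrality.

```r
# Undirected adjacency matrix of the five-node example network
from <- c(1, 2, 2, 1, 4)
to   <- c(2, 3, 4, 5, 1)
A <- matrix(0, nrow = 5, ncol = 5)
A[cbind(from, to)] <- 1
A[cbind(to, from)] <- 1

# Power iteration: x <- A x, rescaled each round, until the scores stop changing.
# Iterating on A + I instead of A keeps the same eigenvectors but avoids
# oscillation on bipartite graphs.
eigen_centrality_power <- function(A, tol = 1e-10, max_iter = 1000) {
  M <- A + diag(nrow(A))
  x <- rep(1, nrow(A))                 # start with equal scores
  for (iter in seq_len(max_iter)) {
    x_new <- as.vector(M %*% x)
    x_new <- x_new / max(x_new)        # most central node gets a score of 1
    if (max(abs(x_new - x)) < tol) break
    x <- x_new
  }
  x_new
}

x <- eigen_centrality_power(A)
round(x, 3)

# Cross-check against the leading eigenvector from base R's eigen()
v <- abs(eigen(A, symmetric = TRUE)$vectors[, 1])
all.equal(x, v / max(v))
```

Jesper (1) and Pernille (2) share the top score, and Dorte (4), connected to both of them, scores clearly higher than Jacob (3) and Donald (5), even though her degree is only one higher.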

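Betweenness centrality

Betweenness centrality takes a different angle: it measures how often a node lies on the shortest paths between other pairs of nodes. Nodes with high betweenness act as bridges or brokers between otherwise weakly connected parts of the network.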

g %>%
    ggraph(layout = "fr") + 
    geom_edge_link() + 
    geom_node_point(aes(size = centrality_between, colour = centrality_between)) + 
    scale_color_continuous(guide = "legend") + 
    theme_graph()
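A compact base-R sketch of Brandes' algorithm, a standard way to compute betweenness for all nodes at once: it runs a breadth-first search from every node, counts the shortest paths, and then accumulates each node's share of them. The helper `betweenness_brandes()` is hypothetical (not a tidygraph or igraph function) and assumes an unweighted, undirected graph given as neighbour lists. On the five-node example network, Jesper (1) and Pernille (2) each lie on three shortest paths between other pairs, all other nodes on none:

```r
# Neighbour lists of the five-node example network (undirected)
from <- c(1, 2, 2, 1, 4)
to   <- c(2, 3, 4, 5, 1)
adj  <- lapply(1:5, function(i) c(to[from == i], from[to == i]))

# Hypothetical helper: Brandes' algorithm for unweighted, undirected graphs
betweenness_brandes <- function(adj) {
  n  <- length(adj)
  cb <- numeric(n)
  for (s in seq_len(n)) {
    # Breadth-first search from s, counting shortest paths (sigma)
    dist  <- rep(-1, n); dist[s] <- 0
    sigma <- numeric(n); sigma[s] <- 1
    preds <- vector("list", n)          # predecessors on shortest paths
    order_visited <- integer(0)
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      order_visited <- c(order_visited, v)
      for (w in adj[[v]]) {
        if (dist[w] < 0) {              # first time we reach w
          dist[w] <- dist[v] + 1
          queue <- c(queue, w)
        }
        if (dist[w] == dist[v] + 1) {   # v lies on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    # Walk back from the farthest nodes, accumulating dependencies (delta)
    delta <- numeric(n)
    for (w in rev(order_visited)) {
      for (v in preds[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb / 2  # undirected: every pair was counted once from each end
}

b <- betweenness_brandes(adj)
b  # 3 3 0 0 0
```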

Clustering (Community detection)

Another common operation is to group nodes based on the graph topology, often referred to as community detection, a term that reflects its prominence in social network analysis.

All clustering algorithms from igraph are available in tidygraph using the group_* prefix. All of these functions return an integer vector with nodes (or edges) sharing the same integer being grouped together.

# We create an example network
g <- play_islands(n_islands = 5, #  The number of densely connected islands
                  size_islands = 15, # The number of nodes in each island
                  p_within = 0.75, # The probability of an edge between two nodes of the same island
                  m_between = 5 # The number of edges between groups/islands
                  ) 
# As planned, we clearly see distinct communities
g %>% 
    ggraph(layout = 'kk') + 
    geom_edge_link() + 
    geom_node_point(size = 7) + 
    theme_graph()

# We run community detection simply with one of tidygraph's group_* functions. Here, the Louvain algorithm is a well-performing and fast choice.
g <- g %N>% 
    mutate(community = group_louvain() %>% as.factor()) 
# Let's see how well it did...
g %>% 
    ggraph(layout = 'kk') + 
    geom_edge_link() + 
    geom_node_point(aes(colour = community), size = 7) + 
    theme_graph()
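The Louvain algorithm searches for a partition with high modularity \(Q\), which compares the number of edges inside communities with what would be expected if edges were placed at random while keeping every node's degree: \(Q = \frac{1}{2m}\sum_{i,j}\left(a_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)\), where \(m\) is the number of edges, \(k_i\) the degree of node \(i\), and \(\delta(c_i, c_j) = 1\) if \(i\) and \(j\) are in the same community (0 otherwise). A base-R sketch computing \(Q\) on a hypothetical toy graph of two triangles joined by one edge (`modularity_q()` is an illustrative helper, not a tidygraph function):

```r
# Two triangles (1-2-3 and 4-5-6) connected by the single edge 3-4
from <- c(1, 1, 2, 4, 4, 5, 3)
to   <- c(2, 3, 3, 5, 6, 6, 4)
A <- matrix(0, nrow = 6, ncol = 6)
A[cbind(from, to)] <- 1
A[cbind(to, from)] <- 1

modularity_q <- function(A, membership) {
  k     <- rowSums(A)                           # node degrees
  two_m <- sum(A)                               # 2 * number of edges
  same  <- outer(membership, membership, "==")  # delta(c_i, c_j)
  sum((A - outer(k, k) / two_m)[same]) / two_m
}

q_natural   <- modularity_q(A, c(1, 1, 1, 2, 2, 2))  # 5/14, about 0.357
q_arbitrary <- modularity_q(A, c(1, 2, 1, 2, 1, 2))  # -3/14, worse than random
c(q_natural, q_arbitrary)
```

Splitting along the two triangles scores far higher than an arbitrary split, which is exactly the kind of difference a modularity-maximizing method like Louvain exploits.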

Network level measures

Your turn

Please do Exercise 1 in the corresponding section on Github.

Case: Networks are coming…

So, let's get serious. Appropriate for the weather these days in Denmark, the theme is "winter is coming…". Therefore, we will have some fun analysing the Game of Thrones data provided by Andrew Beveridge. It consists of character interaction networks for George R. R. Martin's "A Song of Ice and Fire" saga (yes, we are talking about the books…). These networks were created by connecting two characters whenever their names (or nicknames) appeared within 15 words of one another in one of the books in "A Song of Ice and Fire." The edge weight corresponds to the number of interactions.

This is a skill you will be able to apply on your own after the second part of M2.

Build the graph

First, we load the edge list, which connects all characters appearing in the books:

edges <- read_csv("https://www.dropbox.com/s/l8v3if1271nu8yx/asoiaf-all-edges.csv?dl=1") 
colnames(edges) <- tolower(colnames(edges))
edges %>% head()

So, that's what we have: a classic edge list, with the two interacting characters in the source and target columns. Note that in this case the edges are weighted.

OK, let's see how many characters we have overall.

n_distinct(c(edges$source, edges$target))
## [1] 796

Because there are so many characters in the books, many of them minor, I am subsetting the data to the 100 characters with the most interactions across all books. The edges are undirected, so each pair of characters appears only once, with either character in the source or the target column. Therefore, I gather the source and target columns into one column before summing up the weights per character.

chars_main <- edges %>%
  select(-type) %>%
  gather(x, name, source:target) %>%
  group_by(name) %>%
  summarise(sum_weight = sum(weight)) %>%
  ungroup() %>%
  arrange(desc(sum_weight)) %>%
  slice(1:100)

head(chars_main)
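Why stacking both columns matters can be seen on a hypothetical toy edge list (the characters and weights below are made up, not taken from the book data). Summing only the source column would credit Sansa with 4 instead of 9, because most of her interactions are listed with her as the target. A minimal base-R sketch of the same logic as the gather() + summarise() pipeline above:

```r
# Hypothetical toy edge list (made-up weights, not the book data)
toy_edges <- data.frame(source = c("Arya", "Arya", "Jon", "Sansa"),
                        target = c("Jon", "Sansa", "Sansa", "Tyrion"),
                        weight = c(5, 3, 2, 4),
                        stringsAsFactors = FALSE)

# Stack both endpoint columns so that each edge counts for both characters
stacked <- data.frame(name   = c(toy_edges$source, toy_edges$target),
                      weight = c(toy_edges$weight, toy_edges$weight),
                      stringsAsFactors = FALSE)

# Sum the weights per character (a node's "strength") and sort descending
strength <- aggregate(weight ~ name, data = stacked, FUN = sum)
strength <- strength[order(-strength$weight), ]
strength  # Sansa 9, Arya 8, Jon 7, Tyrion 4
```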

So far, so good. If we only go by edge weights, it seems as if Tyrion might make it… my favorite anyhow. Let's reduce our edge list to these main characters, just to warm up and keep the overview.

edges %<>%
  filter(source %in% chars_main$name & target %in% chars_main$name) %>%
  select(source, target, weight) %>%
  rename(from = source,
         to = target)
# Note: Since the data is small, filtering with %in% is fine. With large datasets, however, I would filter via semi_join() instead (more efficient)

Now we can convert our edgelist into a tbl_graph object structure.

g <- edges %>% as_tbl_graph(directed = FALSE)

g
## # A tbl_graph: 100 nodes and 798 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 100 x 1 (active)
##   name                           
##   <chr>                          
## 1 Aemon-Targaryen-(Maester-Aemon)
## 2 Aeron-Greyjoy                  
## 3 Aerys-II-Targaryen             
## 4 Alliser-Thorne                 
## 5 Arianne-Martell                
## 6 Arya-Stark                     
## # ... with 94 more rows
## #
## # Edge Data: 798 x 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     4      7
## 2     1    13      4
## 3     1    28      3
## # ... with 795 more rows

We can use some of the tidygraph helpers to briefly clean the graph. Check ?node_is_* and ?edge_is_* for options.

# Filtering out multiple edges and isolated nodes (unconnected), in case there are some
g <- g %E>%
  filter(!edge_is_multiple()) %N>%
  filter(!node_is_isolated()) 

Note that the edges in this graph are weighted. We can briefly look at the weight distribution:

g %E>%
  as_tibble() %>%
  ggplot(aes(x = weight)) +
  geom_histogram()

We see a right-skewed distribution with many weak and a few very strong edges. Let's take a look at the edges with the highest weight (meaning here: the characters with the most interactions).

g %E>%
  as_tibble() %>%
  arrange(desc(weight)) %>%
  head()

tidygraph always uses numeric IDs for nodes, which are also used to label the edges. This is not very helpful for getting insights, so let's pull in the node names instead.

# We access the nodes directly via .N(). The same can be done for edges with .E() and the graph with .G(). Check ?context_accessors for more information
g %E>%
  mutate(name_from = .N()$name[from],
         name_to = .N()$name[to]) %>%
  as_tibble() %>%
  select(name_from, name_to, weight) %>%
  arrange(desc(weight)) %>%
  head()
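The `.N()$name[from]` trick is plain vector indexing: because the edge table stores the row positions of its endpoints in the node table, indexing the node names with those integers translates IDs into names. A base-R sketch with hypothetical values (`node_names` and `edge_tbl` are illustrative, not taken from the book data):

```r
# Hypothetical node names and an edge table that stores node positions
node_names <- c("Arya", "Jon", "Sansa")
edge_tbl <- data.frame(from = c(1, 2), to = c(3, 3), weight = c(4, 9))

# Integer IDs index directly into the node name vector
edge_tbl$name_from <- node_names[edge_tbl$from]
edge_tbl$name_to   <- node_names[edge_tbl$to]
edge_tbl[order(-edge_tbl$weight), c("name_from", "name_to", "weight")]
```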

Node Characteristics

g <- g %N>%
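  mutate(centrality_dgr = centrality_degree(weights = weight),
         centrality_eigen = centrality_eigen(weights = weight),
         # Betweenness treats weights as distances, so strong ties (high weight) are inverted into short distances
         centrality_between = centrality_betweenness(weights = 1 / weight)) 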
bind_cols(g %N>%
            select(name, centrality_dgr) %>%
            arrange(desc(centrality_dgr)) %>%
            as_tibble(),
          g %N>%
            select(name, centrality_eigen) %>%
            arrange(desc(centrality_eigen)) %>%
            as_tibble(),
          g %N>%
            select(name, centrality_between) %>%
            arrange(desc(centrality_between)) %>%
            as_tibble()) %>%
  mutate_if(is.numeric, round, 1) %>%
  head()

Communities & Groups

g <- g %N>% 
    mutate(community = group_louvain() %>% as.factor()) 
g %N>%
  select(name, community, centrality_dgr) %>%
  as_tibble() %>% 
  arrange(community, desc(centrality_dgr)) %>%
  group_by(community) %>%
  slice(1:5) %>% mutate(n = row_number()) %>% # row_number() also works for communities with fewer than 5 members
  ungroup() %>%
  select(-centrality_dgr) %>%
  spread(community, name)

Network Visualization I

g %>% ggraph(layout = "fr") + 
    geom_edge_link() + 
    geom_node_point() +
  geom_node_text(aes(label = name)) 

g %E>% 
  filter(weight >= quantile(weight, 0.5)) %N>%
  filter(!node_is_isolated()) %>%
  ggraph(layout = "fr") + 
    geom_edge_link(aes(width = weight), alpha = 0.2) + 
    geom_node_point(aes(color = community, size = centrality_eigen)) +
    geom_node_text(aes(label = name, size = centrality_eigen), repel = TRUE) +
    scale_color_brewer(palette = "Set1") +
    theme_graph() +
    labs(title = "A Song of Ice and Fire character network",
         subtitle = "Nodes are colored by community")

Your turn

Please do Exercise 2 in the corresponding section on Github.

More info

You can find more info about:

  • tidygraph here
  • ggraph here
  • A Datacamp Python project for the same data set here
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] Matrix_1.2-17   ggraph_1.0.2    tidygraph_1.1.2 igraph_1.2.4.1 
##  [5] magrittr_1.5    forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3    
##  [9] purrr_0.3.2     readr_1.3.1     tidyr_0.8.3     tibble_2.1.3   
## [13] ggplot2_3.2.1   tidyverse_1.2.1 pacman_0.5.1    knitr_1.24     
## 
## loaded via a namespace (and not attached):
##  [1] ggrepel_0.8.1      Rcpp_1.0.2         lubridate_1.7.4   
##  [4] lattice_0.20-38    assertthat_0.2.1   zeallot_0.1.0     
##  [7] digest_0.6.20      utf8_1.1.4         ggforce_0.3.0     
## [10] R6_2.4.0           cellranger_1.1.0   plyr_1.8.4        
## [13] backports_1.1.4    evaluate_0.14      httr_1.4.1        
## [16] pillar_1.4.2       rlang_0.4.0        lazyeval_0.2.2    
## [19] curl_4.0           readxl_1.3.1       rstudioapi_0.10   
## [22] rmarkdown_1.14.3   labeling_0.3       polyclip_1.10-0   
## [25] munsell_0.5.0      broom_0.5.2        compiler_3.6.1    
## [28] modelr_0.1.5       xfun_0.8           pkgconfig_2.0.2   
## [31] htmltools_0.3.6    tidyselect_0.2.5   gridExtra_2.3     
## [34] fansi_0.4.0        viridisLite_0.3.0  crayon_1.3.4      
## [37] withr_2.1.2        MASS_7.3-51.4      grid_3.6.1        
## [40] nlme_3.1-141       jsonlite_1.6       gtable_0.3.0      
## [43] scales_1.0.0       cli_1.1.0          stringi_1.4.3     
## [46] farver_1.1.0       viridis_0.5.1      xml2_1.2.2        
## [49] vctrs_0.2.0        generics_0.0.2     RColorBrewer_1.1-2
## [52] tools_3.6.1        glue_1.3.1         tweenr_1.0.1      
## [55] hms_0.5.0          yaml_2.2.0         colorspace_1.4-1  
## [58] rvest_0.3.4        haven_2.1.1